Plasma proteome profiling discovers novel proteins associated with non‐alcoholic fatty liver disease

Lili Niu, Philipp E Geyer, Nicolai J Wewer Albrechtsen, Lise L Gluud, Alberto Santos, Sophia Doll, Peter V Treit, Jens J Holst, Filip K Knop, Tina Vilsbøll, Anders Junker, Stephan Sachs, Kerstin Stemmer, Timo D Müller, Matthias H Tschöp, Susanna M Hofmann, Matthias Mann

Abstract

Non‐alcoholic fatty liver disease (NAFLD) affects 25% of the population and can progress to cirrhosis with limited treatment options. As the liver secretes most of the blood plasma proteins, liver disease may affect the plasma proteome. Plasma proteome profiling of 48 patients with and without cirrhosis or NAFLD revealed six statistically significantly changing proteins (ALDOB, APOM, LGALS3BP, PIGR, VTN, and AFM), two of which are already linked to liver disease. Polymeric immunoglobulin receptor (PIGR) was significantly elevated in both cohorts by 170% in NAFLD and 298% in cirrhosis and was further validated in mouse models. Furthermore, a global correlation map of clinical and proteomic data strongly associated DPP4, ANPEP, TGFBI, PIGR, and APOE with NAFLD and cirrhosis. The prominent diabetic drug target DPP4 is an aminopeptidase like ANPEP, ENPEP, and LAP3, all of which are up‐regulated in the human or mouse data. Furthermore, ANPEP and TGFBI have potential roles in extracellular matrix remodeling in fibrosis. Thus, plasma proteome profiling can identify potential biomarkers and drug targets in liver disease.

Notebook

This notebook is a step-by-step guide to reproducing the analyses described in the Clinical Knowledge Graph article. In the automated pipeline, these analyses follow the sequence of steps defined in the configuration file (report_manager/config/proteomics.yml). Here, we use CKG’s API to walk through the same analytical workflow.

[1]:

import os
import pandas as pd

import ckg.ckg_utils as ckg_utils

from ckg.analytics_core.analytics import analytics
from ckg.analytics_core.viz import viz

from ckg.report_manager import project, knowledge
from ckg.report_manager.dataset import ProteomicsDataset, ClinicalDataset

from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
%matplotlib inline
init_notebook_mode(connected=True)

C:\Users\sande\.conda\envs\ckgenv\lib\site-packages\outdated\utils.py:18: OutdatedPackageWarning: The package outdated is out of date. Your version is 0.2.0, the latest is 0.2.1.
Set the environment variable OUTDATED_IGNORE=1 to disable these warnings.
  **kwargs
C:\Users\sande\.conda\envs\ckgenv\lib\site-packages\outdated\utils.py:18: OutdatedPackageWarning: The package pingouin is out of date. Your version is 0.3.10, the latest is 0.3.11.
Set the environment variable OUTDATED_IGNORE=1 to disable these warnings.
  **kwargs

WGCNA functions will not work. Module Rpy2 not installed.
R functions will not work. Module Rpy2 not installed.

[2]:

analysis_dir = '../../../../data/tmp/Niu2019'
ckg_utils.checkDirectory(analysis_dir)

Load Data

This study is one of the datasets provided with CKG, and its project data has already been loaded into CKG. The pipeline starts by building the project from the types of datasets available (Clinical, Proteomic, etc.). We can access the data by creating a project object with the right project identifier and calling the query_data() function, which retrieves all the available data for us.

[3]:

p = project.Project(identifier="P0000001", datasets={}, knowledge=None, report={}, configuration_files={})
project_info = p.query_data()

[4]:

project_info

[4]:

{'attributes':   acronym           data_types description identifier  \
 0   NAFLD  proteomics|clinical        None   P0000014

                                 name  number_subjects responsible status
 0  Non-alcoholic fatty liver disease               48        test   None  ,
 'similarity':                              current current_id  \
 0  Non-alcoholic fatty liver disease   P0000014
 1  Non-alcoholic fatty liver disease   P0000014
 2  Non-alcoholic fatty liver disease   P0000014
 3  Non-alcoholic fatty liver disease   P0000014
 4  Non-alcoholic fatty liver disease   P0000014
 5  Non-alcoholic fatty liver disease   P0000014

                                          description              other  \
 0  The altered molecular proteins and pathways in...  Covid-19 - plasma
 1                                               None           Melanoma
 2                                               None        QUOD Kidney
 3                                               None            DorDilu
 4                                               None    Melanoma-DIA-NN
 5                                               None         test mztab

    other_id responsible  similarity_pearson
 0  P0000017        test            0.100648
 1  P0000012        test           -0.025570
 2  P0000016        test           -0.039196
 3  P0000019        test           -0.064045
 4  P0000013        test           -0.070746
 5  P0000015        test           -0.233736  ,
 'overlap':         from  intersection                      project1_name  project1_total  \
 0   P0000013          2955                    Melanoma-DIA-NN            4764
 1   P0000013          3729                    Melanoma-DIA-NN            4764
 2   P0000016          3046                        QUOD Kidney            6638
 3   P0000012          2070                           Melanoma            3445
 4   P0000012          1666                           Melanoma            3445
 5   P0000012           914                           Melanoma            3445
 6   P0000012          1826                           Melanoma            3445
 7   P0000014           867  Non-alcoholic fatty liver disease            1927
 8   P0000017           729                  Covid-19 - plasma            3868
 9   P0000014           892  Non-alcoholic fatty liver disease            1927
 10  P0000014          1044  Non-alcoholic fatty liver disease            1927
 11  P0000013           767                    Melanoma-DIA-NN            4764
 12  P0000012           562                           Melanoma            3445
 13  P0000014           325  Non-alcoholic fatty liver disease            1927
 14  P0000016           690                        QUOD Kidney            6638
 15  P0000014             4  Non-alcoholic fatty liver disease            1927
 16  P0000015             3                         test mztab               4
 17  P0000012             4                           Melanoma            3445
 18  P0000015             4                         test mztab               4
 19  P0000013             4                    Melanoma-DIA-NN            4764
 20  P0000015             3                         test mztab               4

     project1_unique                      project2_name  project2_total  \
 0              1809                  Covid-19 - plasma            3868
 1              1035                        QUOD Kidney            6638
 2              3592                  Covid-19 - plasma            3868
 3              1375                    Melanoma-DIA-NN            4764
 4              1779                  Covid-19 - plasma            3868
 5              2531                            DorDilu            1531
 6              1619                        QUOD Kidney            6638
 7              1060                  Covid-19 - plasma            3868
 8              3139                            DorDilu            1531
 9              1035                    Melanoma-DIA-NN            4764
 10              883                        QUOD Kidney            6638
 11             3997                            DorDilu            1531
 12             2883  Non-alcoholic fatty liver disease            1927
 13             1602                            DorDilu            1531
 14             5948                            DorDilu            1531
 15             1923                         test mztab               4
 16                1                            DorDilu            1531
 17             3441                         test mztab               4
 18                0                  Covid-19 - plasma            3868
 19             4760                         test mztab               4
 20                1                        QUOD Kidney            6638

     project2_unique  similarity        to
 0               913    0.520521  P0000017
 1              2909    0.485990  P0000016
 2               822    0.408311  P0000017
 3              2694    0.337188  P0000013
 4              2202    0.295024  P0000017
 5               617    0.225012  P0000019
 6              4812    0.221146  P0000016
 7              3001    0.175933  P0000017
 8               802    0.156103  P0000019
 9              3872    0.153820  P0000013
 10             5594    0.138811  P0000016
 11              764    0.138748  P0000019
 12             1365    0.116840  P0000014
 13             1206    0.103734  P0000019
 14              841    0.092258  P0000019
 15                0    0.002076  P0000015
 16             1528    0.001958  P0000019
 17                0    0.001161  P0000015
 18             3864    0.001034  P0000017
 19                0    0.000840  P0000015
 20             6635    0.000452  P0000016  }

Here, project_info contains all the information (attributes) for the chosen project (name, acronym, description, etc.), as well as its overlap and similarity with other projects in CKG’s database. We save these attributes and project similarities in the project object p.
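The similarity values in the 'overlap' table above are numerically consistent with a Jaccard index (shared items divided by the size of the union). This is inferred from the printed numbers, not taken from CKG's source code, so treat the sketch below as an illustration; the function name is hypothetical.

```python
def jaccard_similarity(intersection: int, unique_1: int, unique_2: int) -> float:
    """Jaccard index: shared items divided by the size of the union."""
    union = intersection + unique_1 + unique_2
    return intersection / union if union else 0.0


# Row 0 of the 'overlap' table: Melanoma-DIA-NN vs. Covid-19 - plasma
row_0 = jaccard_similarity(intersection=2955, unique_1=1809, unique_2=913)
# Row 15: Non-alcoholic fatty liver disease vs. test mztab
row_15 = jaccard_similarity(intersection=4, unique_1=1923, unique_2=0)

print(round(row_0, 6))   # 0.520521
print(round(row_15, 6))  # 0.002076
```

Both values agree with the similarity column printed for those rows.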

[5]:

p.set_attributes(project_info)
p.get_similar_projects(project_info)
p.get_projects_overlap(project_info)

Now it is time to create a dataset object for each data type in the project and store them as a dictionary of datasets in p. We create the datasets without specifying any configuration. When CKG runs in an automated manner, the configuration files in report_manager/config define how each dataset is analysed and which parameters are used by default. Here, we will run the analyses of the proteomics data step by step.

[6]:

for data_type in p.data_types:
    dataset = None
    configuration = None
    if data_type == "proteomics":
        dataset = ProteomicsDataset(p.identifier, data={}, configuration=configuration, analysis_queries={}, report=None)
    elif data_type == "clinical":
        dataset = ClinicalDataset(p.identifier, data={}, configuration=configuration, analysis_queries={}, report=None)

    if dataset is not None:
        dataset.generate_dataset()
        p.update_dataset({data_type: dataset})

We can now see that our project p has two datasets: clinical and proteomics. Each dataset already contains several dataframes:

original: data as it was collected/generated, with no processing done.
processed: processed data after normalization, imputation, batch effect correction, etc.
specific dataframes: depending on the type of data, extra dataframes are generated, for instance a list of clinical variables or annotations from Gene Ontology or other databases.

[7]:

p.list_datasets()

[7]:

dict_keys(['proteomics', 'clinical'])

[8]:

clinical_dataset = p.get_dataset('clinical')
clinical_dataset.list_dataframes()

[8]:

['clinical variables', 'original', 'processed']

[9]:

proteomics_dataset = p.get_dataset('proteomics')
proteomics_dataset.list_dataframes()

[9]:

['number of proteins',
 'number of peptides',
 'number of modified proteins',
 'protein biomarkers',
 'tissue qcmarkers',
 'metadata',
 'protein pathway annotation',
 'protein go annotation',
 'original',
 'processed']

[10]:

proteomics_dataset.get_dataframe('processed').head()

[10]:

identifier	group	sample	subject	A2M~P01023	A30~A2MYE2	ABI3BP~Q7Z7G0	ACE~P12821	ACTB~P60709	ACTN1~P12814	ADA2~Q9NZK5	...	VCAM1~P19320	VCL~P18206	VH6DJ~A2N0T4	VIM~P08670	VK3~A2N2F4	VNN1~O95497	VTN~P04004	VWF~P04275	YWHAZ~P63104	scFv~Q65ZC9
0	Cirrhosis	69_F8	69	38.005564	28.173504	21.631230	22.251041	27.090330	25.039968	23.442151	...	26.016356	26.337731	31.159485	24.178889	25.835908	22.480055	32.815815	28.922779	22.347244	27.788928
1	Cirrhosis	70_F9	70	37.309118	27.981907	27.342062	23.847270	27.461155	25.896268	23.754503	...	27.343842	25.535996	31.994997	23.709777	25.004889	23.852908	32.722121	29.881279	22.141285	26.869972
2	Cirrhosis	71_F10	71	37.384952	28.857627	21.080035	22.863630	27.929764	24.295225	23.359443	...	26.353869	25.858635	30.139559	23.599064	26.271650	24.232132	32.755752	29.444625	21.972598	28.069328
3	Cirrhosis	72_F11	72	38.417225	28.978380	25.501910	22.992774	27.152479	25.231288	23.701340	...	26.959475	26.531017	31.977294	24.179076	25.929200	24.269047	32.714014	29.397176	22.216971	28.170209
4	Cirrhosis	73_F12	73	37.471303	28.748744	20.200498	21.326143	27.537048	22.392992	22.406264	...	26.473269	26.355535	30.485582	23.865224	26.701340	20.953141	32.722691	28.540895	18.630532	28.612280

5 rows × 512 columns

Processing of Proteomics Dataset

To show how the original data becomes the processed dataframe, we call the function used for this step, analytics.get_proteomics_measurements_ready(), ourselves. Its parameters are defined as follows:

df: long-format pandas dataframe with columns ‘group’, ‘sample’, ‘subject’, ‘identifier’ (protein), ‘name’ (gene) and ‘LFQ_intensity’.
index_cols: column labels to be kept as index identifiers.
drop_cols: column labels to be dropped from the dataframe.
group: column label containing group identifiers.
identifier: column label containing feature identifiers (e.g. protein identifiers).
extra_identifier: column label containing additional protein identifiers (e.g. gene names).
filter_samples: if True filter samples with valid values below percentage (filter_samples_percent).
filter_samples_percent: defines the maximum percentage of missing values allowed in a sample.
imputation: if True performs imputation of missing values.
imputation_method: method for missing values imputation (‘KNN’, ‘distribution’, or ‘mixed’)
missing_method: defines which expression rows are counted to determine if a column has enough valid values to survive the filtering process.
missing_per_group: if True filter proteins based on valid values per group; if False filter across all samples.
missing_max: maximum ratio of missing/valid values to be filtered.
min_valid: minimum number of valid values to be filtered.
value_col: column label containing expression values.
shift: when using distribution imputation, the down-shift
nstd: when using distribution imputation, the width of the distribution
knn_cutoff: when using KNN imputation, the minimum percentage of valid values for which to use KNN imputation (e.g. 0.6 -> if at least 60% of the values are valid, use KNN; otherwise MinProb)
normalize: whether or not to normalize the data
normalization_method: method to be used to normalize the data (‘median’, ‘quantile’, ‘linear’, ‘zscore’, ‘median_polish’) (only with normalize=True)
normalize_group: normalize per group or not (only with normalize=True)
normalize_by: whether the normalization should be done by ‘features’ (columns) or ‘samples’ (rows) (only with normalize=True)
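To make the shift and nstd parameters concrete, here is a minimal sketch of imputation from a down-shifted normal distribution, written with the standard library only. It assumes the common convention in which missing values are drawn from a normal distribution centred at mean - shift * std with width nstd * std; whether CKG applies exactly this rule (and whether it works per sample or per protein) is an assumption, and the function name is hypothetical.

```python
import random
import statistics


def impute_from_shifted_normal(values, shift=1.8, nstd=0.3, seed=0):
    """Replace None entries with draws from a down-shifted, narrowed normal."""
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed values to estimate the spread")
    mean = statistics.mean(observed)
    std = statistics.stdev(observed)
    center = mean - shift * std  # shift: down-shift in units of the observed std
    width = nstd * std           # nstd: width as a fraction of the observed std
    rng = random.Random(seed)    # fixed seed so the sketch is reproducible
    return [v if v is not None else rng.gauss(center, width) for v in values]


# Hypothetical log2 intensities with two missing measurements
lfq = [27.1, 26.8, None, 27.5, 26.9, None, 27.3]
imputed = impute_from_shifted_normal(lfq, shift=1.8, nstd=0.3)
print(imputed)
```

Drawing from the low end of the distribution reflects the assumption that values are missing mainly because they fall below the detection limit.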

[11]:

original_data = proteomics_dataset.get_dataframe('original')
original_data.head()

[11]:

	LFQ_intensity	batch	group	identifier	name	sample	subject
0	21.593090	None	NAFLD+T2DM	M0R009	A1BG	63_F2	63
1	37.316049	None	Cirrhosis	P01023	A2M	77_G4	77
2	37.309118	None	Cirrhosis	P01023	A2M	70_F9	70
3	38.005564	None	Cirrhosis	P01023	A2M	69_F8	69
4	37.957887	None	Cirrhosis	P01023	A2M	76_G3	76

[12]:

processed_data = analytics.get_proteomics_measurements_ready(df=original_data, index_cols=['subject', 'sample', 'group'],
                                                             imputation=True,
                                                             imputation_method="distribution", missing_method="percentage",
                                                             extra_identifier="name",
                                                             filter_samples=False,
                                                             missing_per_group=True, missing_max=0.3,
                                                             shift=1.8, nstd=0.3,
                                                             value_col='LFQ_intensity')

[13]:

processed_data.head()

[13]:

identifier	subject	sample	group	A2M~P01023	A30~A2MYE2	ABI3BP~Q7Z7G0	ACE~P12821	ACTB~P60709	ACTN1~P12814	ADA2~Q9NZK5	...	VCAM1~P19320	VCL~P18206	VH6DJ~A2N0T4	VIM~P08670	VK3~A2N2F4	VNN1~O95497	VTN~P04004	VWF~P04275	YWHAZ~P63104	scFv~Q65ZC9
0	31	31_C6	Healthy	37.172267	27.313458	25.233156	21.729536	27.979074	24.172188	23.070420	...	25.636164	26.854242	31.263381	24.123099	26.241109	23.096500	32.661954	27.711616	21.272205	28.179259
1	32	32_C7	Healthy	36.897240	28.550101	25.251670	20.671384	26.688458	24.518693	23.557155	...	25.364461	26.409456	31.127802	24.090713	25.906809	23.449222	32.627384	28.778689	22.492604	29.175028
2	33	33_C8	Healthy	37.253761	28.393359	24.360115	19.435843	27.327060	24.788959	22.813820	...	25.679386	27.330393	30.133080	23.291947	26.596194	22.319916	32.676529	28.839721	22.297905	28.177502
3	34	34_C9	Healthy	37.101435	27.986905	25.613204	22.740147	27.323972	24.928996	20.730150	...	25.824324	26.238760	30.460023	23.684959	25.682791	23.034923	32.625970	28.588816	22.040461	28.077144
4	35	35_C10	Healthy	37.169563	28.806458	26.438967	23.431104	26.683380	25.832032	22.667416	...	26.109427	26.766229	30.174624	24.099296	27.743296	24.107891	32.652091	28.482200	22.173561	28.387800

5 rows × 512 columns
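The processed dataframe is a wide table: one row per sample, indexed by the index_cols, and one column per protein labelled name~identifier (for example A2M~P01023). The reshaping step alone can be sketched without pandas as follows. This illustrates only the pivot, not CKG's filtering or imputation, and the function name is hypothetical.

```python
from collections import defaultdict


def long_to_wide(records, index_cols=("subject", "sample", "group"),
                 identifier="identifier", extra_identifier="name",
                 value_col="LFQ_intensity"):
    """Pivot long-format rows into one dict per sample keyed by 'name~identifier'."""
    wide = defaultdict(dict)
    for row in records:
        key = tuple(row[col] for col in index_cols)
        feature = f"{row[extra_identifier]}~{row[identifier]}"
        wide[key][feature] = row[value_col]
    return dict(wide)


# Three rows copied from the head of the 'original' dataframe
records = [
    {"LFQ_intensity": 21.593090, "group": "NAFLD+T2DM", "identifier": "M0R009",
     "name": "A1BG", "sample": "63_F2", "subject": 63},
    {"LFQ_intensity": 37.309118, "group": "Cirrhosis", "identifier": "P01023",
     "name": "A2M", "sample": "70_F9", "subject": 70},
    {"LFQ_intensity": 38.005564, "group": "Cirrhosis", "identifier": "P01023",
     "name": "A2M", "sample": "69_F8", "subject": 69},
]

wide = long_to_wide(records)
for key, features in wide.items():
    print(key, features)
```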

Generate Report

Once the data for all the different data types has been loaded, we can proceed with the statistical analysis and visualization of the results. This is what CKG calls generating a Report for each dataset.

To generate these reports, we make use of the functionality in the analytics core. The automated analysis uses the generate_report() function, which reads the configuration in report_manager/config to run the sequence of analyses defined for each dataset (clinical, proteomics). The code would look something like this:

project_report = p.generate_project_info_report()
p.update_report({"Project information": project_report})
for dataset_type in p.data_types:
    dataset = p.get_dataset(dataset_type)
    if dataset is not None:
        dataset.generate_report()

Here, however, we run some of the analyses manually to show how these steps work and how they can be modified through the available parameters.

Principal Component Analysis (PCA)

[14]:

pca_result, args = analytics.run_pca(processed_data, drop_cols=['sample', 'subject'], group='group')

[15]:

args.update({"loadings":15, "title":'PCA plot groups', 'height':600, 'width':700, 'factor':15})
plot = viz.get_pca_plot(pca_result, identifier='pca', args=args)
iplot(plot.figure)

Functional PCA - single sample Gene Set Enrichment Analysis (ssGSEA)

We will use the Gene Ontology annotations already extracted when creating the proteomics dataset (dataframe: protein go annotation).

[16]:

annotation = proteomics_dataset.get_dataframe('protein go annotation')
annotation.head()

[16]:

	annotation	identifier	source
0	mitochondrial genome maintenance	TYMP~P19971	UniProt
1	maltose metabolic process	MGAM~O43451	UniProt
2	maltose metabolic process	GAA~P10253	UniProt
3	ribosomal large subunit assembly	RPL11~P62913	UniProt
4	ribosomal large subunit assembly	RPLP0~P05388	UniProt

[17]:

ssgsea_result = analytics.run_ssgsea(data=processed_data, annotation=annotation, annotation_col='annotation',
                                     identifier_col='identifier', set_index=['group', 'sample','subject'],
                                     outdir=None, min_size=10, scale=False, permutations=0)

[18]:

pca_result, args = analytics.run_pca(data=ssgsea_result['nes'], drop_cols=['sample', 'subject'], group='group')

[19]:

args.update({"loadings":15, "title":'Functional PCA plot groups', 'height':600, 'width':700, 'factor':0.3})
plot = viz.get_pca_plot(data=pca_result, identifier='pca', args=args)
iplot(plot.figure)

Differential Regulation

[20]:

anova_result = analytics.run_anova(df=processed_data, alpha=0.05,
                                   drop_cols=['sample', 'subject'], subject='subject',
                                   group='group', correction='fdr_bh', is_logged=True)
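The correction='fdr_bh' argument refers to the Benjamini-Hochberg false discovery rate procedure. As a rough illustration of what that correction does to a list of p-values, here is a standard-library sketch; the p-values are hypothetical and the function is not CKG's implementation.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return (adjusted p-values, rejection flags) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted, [p <= alpha for p in adjusted]


pvals = [0.001, 0.008, 0.039, 0.041, 0.60]  # hypothetical raw p-values
padj, rejected = benjamini_hochberg(pvals, alpha=0.05)
print(padj)
print(rejected)
```

In this toy input, the third and fourth p-values are below 0.05 before correction but are no longer rejected afterwards.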

[21]:

anova_result.head()

[21]:

	identifier	group1	group2	mean(group1)	std(group1)	mean(group2)	std(group2)	posthoc Paired	posthoc Parametric	posthoc T-Statistics	...	FC	efftype	F-statistics	pvalue	padj	correction	rejected	-log10 pvalue	Method	posthoc padj
0	A2M~P01023	Cirrhosis	Healthy	37.813581	0.412603	37.119625	0.227292	False	True	4.658543	...	1.617713	hedges	5.019891	0.002073	0.037675	FDR correction BH	True	3.709005	One-way anova	0.016579
1	A2M~P01023	Cirrhosis	NAFLD+NGT	37.813581	0.412603	37.141632	0.263616	False	True	4.339807	...	1.593223	hedges	5.019891	0.002073	0.037675	FDR correction BH	True	3.403735	One-way anova	0.025276
2	A2M~P01023	Cirrhosis	NAFLD+T2DM	37.813581	0.412603	37.435432	0.592818	False	True	1.655627	...	1.299673	hedges	5.019891	0.002073	0.037675	FDR correction BH	True	0.938823	One-way anova	0.444088
3	A2M~P01023	Cirrhosis	T2DM	37.813581	0.412603	37.280256	0.398108	False	True	2.778815	...	1.447261	hedges	5.019891	0.002073	0.037675	FDR correction BH	True	1.860078	One-way anova	0.112973
4	A2M~P01023	Healthy	NAFLD+NGT	37.119625	0.227292	37.141632	0.263616	False	True	-0.199939	...	0.984861	hedges	5.019891	0.002073	0.037675	FDR correction BH	True	0.073776	One-way anova	0.972450

5 rows × 26 columns
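Because the intensities are log2-transformed (is_logged=True), the FC column in this output is consistent with 2 raised to the difference of the two group means. This relationship is inferred from the printed rows rather than from CKG's source; the helper name below is hypothetical.

```python
def fold_change_from_log2(mean_group1: float, mean_group2: float) -> float:
    """Fold change on the linear scale from two log2-scale group means."""
    return 2 ** (mean_group1 - mean_group2)


# Row 0: A2M~P01023, Cirrhosis vs. Healthy (reported FC: 1.617713)
fc_row_0 = fold_change_from_log2(37.813581, 37.119625)
# Row 4: A2M~P01023, Healthy vs. NAFLD+NGT (reported FC: 0.984861)
fc_row_4 = fold_change_from_log2(37.119625, 37.141632)

print(fc_row_0, fc_row_4)
```

A fold change above 1 means the protein is higher in group1, and below 1 that it is higher in group2.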

[22]:

args={'alpha':0.05,
      'fc':2,
      'colorscale':'Blues',
      'showscale': False,
      'marker_size':10,
      'num_annotations':480,
      'x_title':'log2FC',
      'y_title':'-log10(pvalue)'}
figures = viz.run_volcano(anova_result, identifier='volcano', args=args)
for figure in figures:
    iplot(figure.figure)

Correlation Analysis

[23]:

correlation_result = analytics.run_correlation(processed_data, alpha=0.05,
                                               subject='subject', group='group',
                                               method='spearman', correction='fdr_bh')

[24]:

correlation_result.head()

[24]:

	node1	node2	weight	pvalue	padj	rejected
1	A30~A2MYE2	A2M~P01023	0.314264	0.0	0.0	True
4	ABI3BP~Q7Z7G0	A30~A2MYE2	-0.332827	0.0	0.0	True
8	ACE~P12821	ABI3BP~Q7Z7G0	0.056882	0.0	0.0	True
10	ACTB~P60709	A2M~P01023	-0.033977	0.0	0.0	True
11	ACTB~P60709	A30~A2MYE2	-0.104429	0.00242	0.0057	True
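With method='spearman', each weight is a rank correlation between two proteins. The sketch below computes Spearman's coefficient with the standard library by ranking both vectors (ties get the average rank) and taking the Pearson correlation of the ranks. The input uses only the five Healthy rows printed earlier for A2M~P01023 and A30~A2MYE2, so the result is not comparable with the weight computed on the full dataset.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # correlation is undefined for constant input
    return cov / (sd_x * sd_y)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


a2m = [37.172267, 36.897240, 37.253761, 37.101435, 37.169563]
a30 = [27.313458, 28.550101, 28.393359, 27.986905, 28.806458]
rho = spearman(a2m, a30)
print(rho)
```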

[37]:

network = viz.get_network(correlation_result, identifier="Correlation network",
                         args={'source':'node1', 'target':'node2',
                               'title':'Correlation network', 'values':'weight',
                               'cutoff':0.5, 'cutoff_abs':True, 'color_weight': True,
                               'communities_algorithm': 'louvain'})
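The 'cutoff' and 'cutoff_abs' arguments suggest that edges are filtered by correlation weight before the network is drawn. The sketch below shows one plausible reading: keep an edge when its weight, or its absolute value if cutoff_abs is True, reaches the cutoff. Whether CKG uses >= or > is an assumption, and apart from the first edge the weights are hypothetical.

```python
def filter_edges(edges, cutoff=0.5, cutoff_abs=True):
    """Keep (node1, node2, weight) edges whose weight reaches the cutoff."""
    kept = []
    for node1, node2, weight in edges:
        score = abs(weight) if cutoff_abs else weight
        if score >= cutoff:
            kept.append((node1, node2, weight))
    return kept


edges = [
    ("A30~A2MYE2", "A2M~P01023", 0.314264),  # from the table above
    ("protein_a", "protein_b", 0.72),        # hypothetical
    ("protein_a", "protein_c", -0.64),       # hypothetical
]

print(filter_edges(edges, cutoff=0.5, cutoff_abs=True))
print(filter_edges(edges, cutoff=0.5, cutoff_abs=False))
```

With cutoff_abs=True, strong negative correlations are kept as well as strong positive ones.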

[38]:

viz.visualize_notebook_network(network['notebook'], notebook_type='jupyter', layout={})

Functional Enrichment

[27]:

enrichment = analytics.run_up_down_regulation_enrichment(anova_result, annotation,
                                                         identifier='identifier', groups=['group1', 'group2'],
                                                         annotation_col='annotation', reject_col='rejected',
                                                         group_col='group', method='fisher',
                                                         correction='fdr_bh', alpha=0.05, lfc_cutoff=1)

C:\Users\sande\.conda\envs\ckgenv\lib\site-packages\pandas\core\frame.py:6692: FutureWarning:

Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.
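With method='fisher', enrichment of an annotation term among the regulated proteins is assessed with Fisher's exact test on a 2x2 table. Here is a minimal standard-library sketch of the one-sided (over-representation) version. The counts are a toy example, and whether CKG uses the one-sided or two-sided test is not stated in this notebook.

```python
from math import comb


def fisher_exact_greater(hits_in_term, hits_not_in_term,
                         background_in_term, background_not_in_term):
    """One-sided Fisher exact p-value for over-representation in a 2x2 table."""
    total = (hits_in_term + hits_not_in_term
             + background_in_term + background_not_in_term)
    term_size = hits_in_term + background_in_term
    n_hits = hits_in_term + hits_not_in_term
    denominator = comb(total, n_hits)
    # Sum the hypergeometric probabilities of observing at least hits_in_term.
    p_value = 0.0
    for k in range(hits_in_term, min(term_size, n_hits) + 1):
        p_value += comb(term_size, k) * comb(total - term_size, n_hits - k) / denominator
    return p_value


# Toy table: 3 of 4 regulated proteins carry the term, 1 of 4 others does
p = fisher_exact_greater(3, 1, 1, 3)
print(p)
```

For this table the p-value equals 17/70, which can be checked by hand from the hypergeometric distribution.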

[28]:

figures = viz.get_enrichment_plots(enrichment, identifier='enrichment', args={'width':2200})
for fig in figures:
    iplot(fig.figure)

Knowledge from CKG

One way to retrieve relevant knowledge from CKG for the list of significant hits is the annotate_list() function, which extracts information from the knowledge graph related to the provided list of proteins (and other entities as well). The result differs slightly from the automatically generated knowledge report, which also includes results from the analysis of the clinical variables and from the combination of both datasets.

For this, we need to create a Knowledge object and provide the list of proteins we are interested in. The function also makes it possible to provide a list of diseases relevant to the study.

[29]:

kn = knowledge.Knowledge(identifier='NFLD', data=None)

[30]:

sig_hits = list(set(anova_result.loc[anova_result.rejected, "identifier"]))
print(sig_hits)

['HBG2~P69892', 'LGALS3BP~Q08380', 'ALDOB~P05062', 'APOM~O95445', 'PIGR~P01833', 'VTN~P04004', 'TTR~P02766', 'QSOX1~O00391', 'None~A8K1K1', 'PROC~P04070', 'A2M~P01023', 'IGHM~P01871', 'RBP4~P02753', 'LYVE1~Q9Y5Y7', 'ITIH1~P19827', 'V2-13~Q5NV73', 'C1QB~P02746', 'CPN2~P22792', 'IGFBP3~P17936', 'None~A0A120HG46', 'AFM~P43652', 'JCHAIN~P01591', 'ALDH1A1~P00352', 'CLU~P10909', 'VCAM1~P19320', 'IGH@~Q6GMX6', 'COLEC11~Q9BWP8', 'C3~P01024', 'IGFALS~P35858', 'SHBG~P04278', 'GP1BA~P07359', 'CPB2~Q96IY4', 'C6~P13671', 'C7~P10643', 'IGHV5-51~A0A0C4DH38', 'TGFBI~Q15582']
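Each entry in sig_hits combines a gene name and a protein accession separated by '~', and a few entries have 'None' in place of the gene name. If the two parts are needed separately, a small helper such as the following could split them; the helper is hypothetical and not part of CKG.

```python
def split_identifier(feature: str):
    """Split 'NAME~ACCESSION' into (name or None, accession)."""
    name, _, accession = feature.rpartition("~")
    return (None if name in ("", "None") else name), accession


examples = ["PIGR~P01833", "None~A8K1K1", "IGH@~Q6GMX6"]
parsed = [split_identifier(item) for item in examples]
print(parsed)
# [('PIGR', 'P01833'), (None, 'A8K1K1'), ('IGH@', 'Q6GMX6')]
```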

[31]:

kn.annotate_list(query_list=sig_hits,
                 entity_type='protein',
                 queries_file=None,
                 attribute=None,
                 diseases=['cirrhosis', 'non-alcoholic fatty liver disease', 'type 2 diabetes mellitus'],
                 entities=None)

[32]:

kn.graph

[33]:

kn.generate_report(visualizations=['network'], # how to visualize the results (network, sankey)
                   summarize=True, # Whether or not to summarize the annotation
                   method='pagerank', # Method for summarizing the annotation (betweenness, closeness, pagerank)
                   inplace=True) # If True, the summarized graph is saved, otherwise keep full graph
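The method argument names a node centrality measure. As a rough illustration of how PageRank scores the nodes of a graph, here is a plain power-iteration sketch on a toy undirected star graph. How CKG uses these scores to summarize the annotation graph is not shown in this notebook, so this is only background on the measure itself; the graph and node names are hypothetical.

```python
def pagerank(adjacency, damping=0.85, iterations=100):
    """PageRank by power iteration; adjacency maps node -> list of neighbours."""
    nodes = list(adjacency)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, neighbours in adjacency.items():
            if neighbours:
                share = damping * rank[node] / len(neighbours)
                for neighbour in neighbours:
                    new_rank[neighbour] += share
            else:
                # A node without neighbours spreads its rank over all nodes.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


# Toy undirected star: every edge is listed in both directions
graph = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}
scores = pagerank(graph)
print(sorted(scores.items(), key=lambda item: item[1], reverse=True))
```

The hub receives the highest score because every other node points to it, which is the property that makes such scores useful for picking out the most connected entities.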

[34]:

kn.report.visualize_report(environment='notebook')[0]
